37 research outputs found

    Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View

    Multimedia collections are growing more than ever in size and diversity. Effective multimedia retrieval systems are thus critical for giving end users scalable access to these datasets. We are interested in repositories of image/text multimedia objects, and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval. We focus on graph-based methods, which have been shown to provide state-of-the-art performance. We particularly examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework that encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique to combine visual and textual information. We compare cross-media and random-walk-based results on three different real-world datasets. From a practical standpoint, our extended empirical analysis allows us to provide insights and guidelines on the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.
    Comment: An extended version of the paper "Visual and Textual Information Fusion in Multimedia Retrieval using Semantic Filtering and Graph based Methods" by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM Transactions on Information Systems.

    Probabilistic Models of Document Collections

    No full text
    The present study deals with word frequency distributions and their relation to probabilistic Information Retrieval (IR) models. We examine the burstiness phenomenon of word frequencies in textual collections. We propose to model this phenomenon as a property of probability distributions, and we study the Beta Negative Binomial and Log-Logistic distributions as models of word frequencies. We then focus on probabilistic IR models and their fundamental properties. Our analysis reveals that the probability distributions underlying most state-of-the-art models do not take this phenomenon into account, even though fundamental properties of IR models such as concavity implicitly allow them to do so. We then introduce a novel family of probabilistic IR models based on Shannon information. These new models bridge the gap between significant properties of IR models and the burstiness phenomenon of word frequencies. Lastly, we study pseudo-relevance feedback models empirically and theoretically. We propose a theoretical framework that explains well the empirical behaviour and performance of pseudo-relevance feedback models. Overall, this highlights important properties of pseudo-relevance feedback models and shows that some state-of-the-art models are inadequate.

    Modèles probabilistes pour les fréquences de mots et la recherche d'information

    No full text
    The present study deals with word frequency distributions and their relation to probabilistic Information Retrieval (IR) models. We examine the burstiness phenomenon of word frequencies in textual collections. We propose to model this phenomenon as a property of probability distributions, and we study the Beta Negative Binomial and Log-Logistic distributions as models of word frequencies. We then focus on probabilistic IR models and their fundamental properties. Our analysis reveals that the probability distributions underlying most state-of-the-art models do not take this phenomenon into account, even though fundamental properties of IR models such as concavity implicitly allow them to do so. We then introduce a novel family of probabilistic IR models based on Shannon information. These new models bridge the gap between significant properties of IR models and the burstiness phenomenon of word frequencies. Lastly, we study pseudo-relevance feedback models empirically and theoretically. We propose a theoretical framework that explains well the empirical behaviour and performance of pseudo-relevance feedback models. Overall, this highlights important properties of pseudo-relevance feedback models and shows that some state-of-the-art models are inadequate.

    Information-based models for ad hoc IR

    No full text
    In this paper we introduce the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely that the difference in the behavior of a word at the document and collection levels provides information about the significance of the word for the document. This hypothesis has been exploited in the 2-Poisson mixture models, in the notion of eliteness in BM25, and more recently in DFR models. We show here that, combined with notions related to burstiness, it can lead to simpler and better models.

    The Beta-Negative Binomial for Text Modeling

    No full text
    No abstract

    A Theoretical Analysis of Pseudo-Relevance Feedback Models

    No full text
    Our goal in this study is to compare several widely used pseudo-relevance feedback (PRF) models and to understand what explains their respective behavior. To do so, we first analyze how different PRF models behave, through the characteristics of the terms they select and through their performance on two widely used test collections. This analysis reveals that several well-known models surprisingly tend to select very common terms with low IDF (inverse document frequency). We then introduce several conditions that PRF models should satisfy regarding both the terms they select and the way they weigh them, before studying whether standard PRF models satisfy these conditions. This study reveals that most models are deficient with respect to at least one condition, and that this deficiency explains the results of our analysis of the behavior of the models, as well as some of the results reported on the respective performance of PRF models. Based on the PRF conditions, we finally propose possible corrections for the simple mixture model. The PRF models obtained after these corrections outperform their standard versions and yield state-of-the-art PRF models, which confirms the validity of our theoretical analysis.